Named-Entity-based Linking and Exploration of News Using an Adapted Jaccard Metric

Authors

  • Tom De Nies
  • Jasper Verplanken
  • Ruben Verborgh
  • Wesley De Neve
  • Erik Mannens
  • Rik Van de Walle
Abstract

In this paper, we propose a semantically enabled news exploration method to aid journalists in overcoming the information overload in today’s news streams. To achieve this, our approach semantically tags news articles, calculates their relatedness through their similarity based on these tags, and creates an article graph to be browsed by an end-user. Based on related work, the Jaccard metric seemed very suitable for this task. However, when we evaluated this similarity measure through crowdsourcing on a set of 120 article pairs, the results were only acceptable in the lower levels of relatedness, with unpredictable errors elsewhere. This reveals a need for better ground-truth data, and calls for clarification of the semantics of relatedness and similarity, and their relation.
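
The pipeline described in the abstract (tag articles, score pairs by tag overlap, link related articles into a graph) can be illustrated with a minimal sketch. It uses the plain Jaccard coefficient, |A ∩ B| / |A ∪ B|, and not the adapted variant named in the title, whose definition is not given here. The function names, the threshold value, and the entity tags are assumptions made purely for illustration.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Plain Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def build_article_graph(
    tags: dict[str, set[str]], threshold: float
) -> dict[str, dict[str, float]]:
    """Link every pair of articles whose similarity reaches the threshold.

    Returns an undirected graph as an adjacency mapping:
    article -> {neighbour: similarity score}.
    """
    graph: dict[str, dict[str, float]] = {article: {} for article in tags}
    for left, right in combinations(tags, 2):
        score = jaccard(tags[left], tags[right])
        if score >= threshold:
            graph[left][right] = score
            graph[right][left] = score
    return graph


# Hypothetical entity tags, invented for illustration only.
example_tags = {
    "article_a": {"Ghent", "Belgium", "European Union"},
    "article_b": {"Belgium", "European Union", "Brussels"},
    "article_c": {"Tokyo", "Japan"},
}

# The threshold of 0.3 is an arbitrary assumption, not a value from the paper.
example_graph = build_article_graph(example_tags, threshold=0.3)
```

In this toy input, article_a and article_b share two of four distinct entities and are linked, while article_c shares none and stays isolated. A threshold-based graph like this is only one possible reading of "article graph"; a weighted complete graph or a top-k neighbour list would be equally plausible.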

Similar resources

Ghent University-iMinds at MediaEval 2013: An Unsupervised Named Entity-based Similarity Measure for Search and Hyperlinking

In this paper, we describe our approach to the Search and Hyperlinking task at the MediaEval 2013 benchmark. This task focuses on video retrieval and linking in the context of a large and rich dataset provided by the BBC. Our approach makes use of one of three types of audio transcripts, enriched with Named Entities. To compute similarity, we adapt the Jaccard metric to use Named Entities. This...

PAYMA: A Tagged Corpus of Persian Named Entities

The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...

Expanding the horizons: adding a new language to the news personalization system

News360 is a news aggregation system with personalization. Initially created for English, it was recently adapted for German. In this paper, we show that it is possible to adapt such systems automatically, without any manual labour, using only open knowledge bases and Wikipedia dumps. We propose a method for adapting named entity linking and classification to a target language. We show that e...

Combining Multiple Signals for Semanticizing Tweets: University of Amsterdam at #Microposts2015

In this paper we present an approach for extracting and linking entities from short and noisy microblog posts. We describe a diverse set of approaches based on the Semanticizer, an open-source entity linking framework developed at the University of Amsterdam, adapted to the task of the #Microposts2015 challenge. We consider alternatives for dealing with ambiguity that can help in the named enti...

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...

Publication date: 2015